Energy Efficient Deep Learning

  1. Lightweight network structure

    SqueezeNet, MobileNet, and ShuffleNet share the same idea: decouple the spatial convolution from the channel-wise (pointwise) convolution to reduce the number of parameters, sharing a similar spirit with Pseudo-3D Residual Networks, which decouple the temporal and spatial convolutions. SqueezeNet is serial while MobileNet and ShuffleNet are parallel. MobileNet is a special case of ShuffleNet when using only one group.
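
    A minimal PyTorch sketch of this factorization (layer sizes are illustrative): a standard $3\times 3$ convolution versus the MobileNet-style depthwise $3\times 3$ followed by a pointwise $1\times 1$.

    ```python
    import torch
    import torch.nn as nn

    c, d = 64, 128  # illustrative input/output channel counts

    # standard convolution: k*k*c*d parameters (bias omitted)
    standard = nn.Conv2d(c, d, kernel_size=3, padding=1, bias=False)

    # depthwise-separable factorization: one 3x3 filter per channel,
    # then a 1x1 convolution that mixes channels
    separable = nn.Sequential(
        nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c, bias=False),  # k*k*c params
        nn.Conv2d(c, d, kernel_size=1, bias=False),                       # c*d params
    )

    x = torch.randn(1, c, 32, 32)
    assert standard(x).shape == separable(x).shape

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(standard), count(separable))  # 73728 vs. 8768 parameters
    ```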

    Low-rank approximation ($k\times k\times c\times d \rightarrow k\times k\times c\times d' + 1\times 1\times d'\times d$) also falls into the above scope. The difference between MobileNet and low-rank approximation is whether the spatial convolution is depthwise (per-channel) or dense.
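
    The same factorization as a PyTorch sketch: a dense $k\times k$ convolution into $d'$ intermediate channels followed by a $1\times 1$ convolution back to $d$; the rank $d'$ is a hypothetical choice here, purely for illustration.

    ```python
    import torch
    import torch.nn as nn

    k, c, d, d_prime = 3, 64, 128, 16  # d_prime is the chosen rank (illustrative)

    full_rank = nn.Conv2d(c, d, kernel_size=k, padding=1, bias=False)   # k*k*c*d params
    low_rank = nn.Sequential(
        nn.Conv2d(c, d_prime, kernel_size=k, padding=1, bias=False),    # k*k*c*d' params
        nn.Conv2d(d_prime, d, kernel_size=1, bias=False),               # 1*1*d'*d params
    )

    x = torch.randn(1, c, 32, 32)
    assert full_rank(x).shape == low_rank(x).shape

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(full_rank), count(low_rank))  # 73728 vs. 11264 parameters
    ```

    Unlike MobileNet, the first convolution here stays dense across input channels; setting groups=c on it would recover the depthwise variant.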

  2. Tweak network structure

    • prune nodes based on certain criteria (e.g., response value, Fisher information): requires a special implementation and can take up more space than expected due to the irregular network structure; a magnitude-based sketch follows below.
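
    A minimal sketch of criterion-based pruning, using weight magnitude as the simplest criterion; note the mask keeps the dense tensor shape, which is why unstructured sparsity saves no space without a sparse storage format or structured pruning. (torch.nn.utils.prune provides similar unstructured pruning out of the box.)

    ```python
    import torch
    import torch.nn as nn

    def magnitude_prune(layer: nn.Linear, sparsity: float = 0.5) -> torch.Tensor:
        """Zero out the smallest-magnitude weights; return the binary mask."""
        w = layer.weight.data
        k = max(1, int(sparsity * w.numel()))
        threshold = w.abs().flatten().kthvalue(k).values
        mask = (w.abs() > threshold).float()
        w.mul_(mask)  # pruned weights are zeroed but still stored densely
        return mask

    layer = nn.Linear(512, 256)
    mask = magnitude_prune(layer, sparsity=0.9)
    print(f"kept {int(mask.sum())} of {mask.numel()} weights")
    ```
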
  3. Compress weights

    • Quantization (fixed bit width): learn a codebook and encode weights as indices into it. Fine-tune the codebook after quantizing the weights, which averages the gradients of weights belonging to the same cluster. Extreme cases are binary and ternary nets, whose weights are quantized to {-1, 1} (resp. {-1, 0, 1}) with a different scaling factor $\alpha$ for each layer; a codebook sketch follows after this list.
    • Huffman coding (variable bit width): applied after quantization for further compression; a sketch also follows below.
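
    A sketch of codebook quantization for a single weight tensor, using k-means (scikit-learn here, purely for illustration) to learn a 16-entry codebook, i.e. 4-bit indices; the gradient-averaging fine-tuning step described above is omitted.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def quantize(weights, n_codes=16):
        """Replace each weight by its nearest codebook entry (16 codes = 4-bit indices)."""
        km = KMeans(n_clusters=n_codes, n_init=10).fit(weights.reshape(-1, 1))
        codebook = km.cluster_centers_.ravel()       # n_codes float cluster centers
        codes = km.labels_.astype(np.uint8)          # small integer index per weight
        return codebook, codes.reshape(weights.shape)

    w = np.random.randn(64, 64).astype(np.float32)
    codebook, codes = quantize(w)
    w_hat = codebook[codes]                          # de-quantized weights
    print(np.abs(w - w_hat).mean())                  # quantization error
    ```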
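
    And a sketch of Huffman-coding the quantized indices with Python's standard heapq: frequent codebook indices get shorter bit strings, so the average bit width drops below the fixed-width code.

    ```python
    import heapq
    from collections import Counter

    def huffman_table(symbols):
        """Build a prefix-free code mapping each symbol to a bit string."""
        heap = [[freq, [sym, ""]] for sym, freq in Counter(symbols).items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            for pair in lo[1:]:
                pair[1] = "0" + pair[1]   # left branch
            for pair in hi[1:]:
                pair[1] = "1" + pair[1]   # right branch
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return {sym: bits for sym, bits in heap[0][1:]}

    codes = [0, 0, 0, 0, 1, 1, 2, 3]          # e.g. quantized weight indices
    table = huffman_table(codes)
    encoded = "".join(table[c] for c in codes)
    print(table, len(encoded))                 # 14 bits vs. 16 with a fixed 2-bit code
    ```
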
  4. Computation

    • spatial domain to frequency domain: convert convolution into pointwise multiplication using the FFT (convolution theorem); see the sketch below.
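
    A 1-D NumPy sketch of the convolution theorem behind this: circular convolution in the spatial domain equals pointwise multiplication of the FFTs. Real CNN libraries use batched 2-D FFTs and handle padding and striding; this only shows the core identity.

    ```python
    import numpy as np

    x = np.random.randn(64)   # input signal
    w = np.random.randn(5)    # convolution kernel

    # spatial domain: circular convolution, O(n * k) multiplications
    direct = np.array([sum(x[(i - j) % len(x)] * w[j] for j in range(len(w)))
                       for i in range(len(x))])

    # frequency domain: FFT, pointwise multiply, inverse FFT, O(n log n)
    via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(w, n=len(x))).real

    assert np.allclose(direct, via_fft)
    ```
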
  5. Sparsity regularization
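
    Presumably the usual approach: add an L1 (or group-lasso) penalty on the weights to the task loss so that many individual weights (or whole channels) are driven to zero during training and can then be pruned. A minimal PyTorch sketch with an L1 penalty (model and hyperparameters are illustrative):

    ```python
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    l1_lambda = 1e-4  # illustrative regularization strength

    def training_step(x, y):
        task_loss = nn.functional.cross_entropy(model(x), y)
        # L1 penalty pushes individual weights toward exactly zero
        l1_penalty = sum(p.abs().sum() for p in model.parameters())
        loss = task_loss + l1_lambda * l1_penalty
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    training_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))
    ```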

  6. Efficient Inference

    • cascade of networks, early-exit networks (predict whether to exit after each layer) [1] [2]; a sketch follows below.
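
    A toy sketch of the early-exit idea (module names and the confidence threshold are hypothetical): attach a small classifier after each block and stop the forward pass as soon as its softmax confidence crosses the threshold, so easy inputs use less computation.

    ```python
    import torch
    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        """Blocks with an auxiliary classifier ("exit") after each one."""
        def __init__(self, dims=(784, 256, 128), n_classes=10):
            super().__init__()
            self.blocks = nn.ModuleList(
                nn.Sequential(nn.Linear(a, b), nn.ReLU()) for a, b in zip(dims, dims[1:]))
            self.exits = nn.ModuleList(nn.Linear(b, n_classes) for b in dims[1:])

        @torch.no_grad()
        def forward(self, x, threshold=0.9):
            for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
                x = block(x)
                probs = exit_head(x).softmax(dim=-1)
                # stop as soon as the current exit is confident enough
                if probs.max() >= threshold or i == len(self.blocks) - 1:
                    return probs, i

    net = EarlyExitNet()
    probs, exit_index = net(torch.randn(1, 784))
    print(f"exited after block {exit_index}")
    ```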

Good introduction slides: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture15.pdf